12 research outputs found
Polar Codes with exponentially small error at finite block length
We show that the entire class of polar codes (up to a natural necessary
condition) converge to capacity at block lengths polynomial in the gap to
capacity, while simultaneously achieving failure probabilities that are
exponentially small in the block length (i.e., decoding fails with probability
$\exp(-N^{\Omega(1)})$ for codes of length $N$). Previously this combination
was known only for one specific family within the class of polar codes, whereas
we establish this whenever the polar code exhibits a condition necessary for
any polarization.
Our results adapt and strengthen a local analysis of polar codes due to the
authors with Nakkiran and Rudra [Proc. STOC 2018]. Their analysis related the
time-local behavior of a martingale to its global convergence, and this allowed
them to prove that the broad class of polar codes converge to capacity at
polynomial block lengths. Their analysis easily adapts to show exponentially
small failure probabilities, provided the associated martingale, the "Arikan
martingale", exhibits a corresponding strong local effect. The main
contribution of this work is a much stronger local analysis of the Arikan
martingale. This leads to the general result claimed above.
In addition to our general result, we also show, for the first time, polar
codes that achieve failure probability $\exp(-N^{\beta})$ for any $\beta < 1$
while converging to capacity at block length polynomial in the gap to capacity.
Finally we also show that the "local" approach can be combined with any
analysis of failure probability of an arbitrary polar code to get essentially
the same failure probability while achieving block length polynomial in the gap
to capacity.
Comment: 17 pages, Appeared in RANDOM'1
Induced minors and well-quasi-ordering
A graph $H$ is an induced minor of a graph $G$ if it can be obtained from an
induced subgraph of $G$ by contracting edges. Otherwise, $G$ is said to be
$H$-induced minor-free. Robin Thomas showed that $K_4$-induced minor-free
graphs are well-quasi-ordered by induced minors [Graphs without $K_4$ and
well-quasi-ordering, Journal of Combinatorial Theory, Series B, 38(3):240--247,
1985].
We provide a dichotomy theorem for $H$-induced minor-free graphs and show
that the class of $H$-induced minor-free graphs is well-quasi-ordered by the
induced minor relation if and only if $H$ is an induced minor of the gem (the
path on 4 vertices plus a dominating vertex) or of the graph obtained by adding
a vertex of degree 2 to the complete graph on 4 vertices. To this end we proved
two decomposition theorems which are of independent interest.
Similar dichotomy results were previously given for subgraphs by Guoli Ding
in [Subgraphs and well-quasi-ordering, Journal of Graph Theory, 16(5):489--502,
1992] and for induced subgraphs by Peter Damaschke in [Induced subgraphs and
well-quasi-ordering, Journal of Graph Theory, 14(4):427--435, 1990].
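The induced minor relation defined in this abstract can be checked by brute force on very small graphs. The following is a minimal sketch, not the authors' method: it assumes vertices are numbered from 0, and the function names `is_connected` and `is_induced_minor` are hypothetical. Each vertex of the host graph is either deleted or placed in the branch set of one vertex of the target graph; branch sets must be connected, and two branch sets must touch exactly when the corresponding target vertices are adjacent.

```python
from itertools import product


def is_connected(vertices, adj):
    """Return True if the (non-empty) vertex set induces a connected subgraph."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & vertices:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def is_induced_minor(h_edges, h_n, g_edges, g_n):
    """Brute-force test of whether H (h_n vertices) is an induced minor of G (g_n vertices).

    Exponential in g_n, so only suitable for tiny examples.
    """
    adj = {v: set() for v in range(g_n)}
    for u, w in g_edges:
        adj[u].add(w)
        adj[w].add(u)
    h_adj = {frozenset(e) for e in h_edges}
    labels = [None] + list(range(h_n))  # None means the vertex of G is deleted
    for assignment in product(labels, repeat=g_n):
        branch = [[v for v in range(g_n) if assignment[v] == i] for i in range(h_n)]
        if not all(is_connected(b, adj) for b in branch):
            continue
        ok = True
        for i in range(h_n):
            for j in range(i + 1, h_n):
                touching = any(w in adj[u] for u in branch[i] for w in branch[j])
                if touching != (frozenset((i, j)) in h_adj):
                    ok = False
        if ok:
            return True
    return False


# The gem: a path on vertices 0-1-2-3 plus a dominating vertex 4.
GEM = [(0, 1), (1, 2), (2, 3), (4, 0), (4, 1), (4, 2), (4, 3)]
P4 = [(0, 1), (1, 2), (2, 3)]
K3 = [(0, 1), (1, 2), (0, 2)]
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
C5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
```

For instance, `is_induced_minor(K3, 3, C5, 5)` succeeds by contracting two edges of the 5-cycle, while the path `P4` has no triangle as an induced minor.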
General Strong Polarization
Arikan's exciting discovery of polar codes has provided an altogether new way
to efficiently achieve Shannon capacity. Given a (constant-sized) invertible
matrix $M$, a family of polar codes can be associated with this matrix and its
ability to approach capacity follows from the polarization of an
associated $[0,1]$-bounded martingale, namely its convergence in the limit to
either $0$ or $1$. Arikan showed polarization of the martingale associated with
the matrix $G_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$ to get
capacity achieving codes. His analysis was later extended to all matrices $M$
that satisfy an obvious necessary condition for polarization.
While Arikan's theorem does not guarantee that the codes achieve capacity at
small blocklengths, it turns out that a "strong" analysis of the polarization
of the underlying martingale would lead to such constructions. Indeed for the
martingale associated with $G_2$ such a strong polarization was shown in two
independent works ([Guruswami and Xia, IEEE IT '15] and [Hassani et al., IEEE
IT '14]), resolving a major theoretical challenge of the efficient attainment
of Shannon capacity.
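The polarization described above can be observed numerically in the simplest setting. The sketch below is an illustration under stated assumptions, not the analysis of the paper: it assumes the binary erasure channel and Arikan's 2x2 kernel, for which it is a standard fact that a channel with erasure probability z splits into two synthesized channels with erasure probabilities 2z - z^2 and z^2. The function names `polarize_bec` and `unpolarized_fraction` are hypothetical.

```python
def polarize_bec(erasure_prob, levels):
    """Return the erasure probabilities of the 2**levels synthesized channels.

    Each level replaces a channel with erasure probability z by a worse
    channel (2z - z^2) and a better channel (z^2). Picking one of the two
    uniformly at random at every level gives a [0,1]-bounded martingale,
    since the average of the two values is again z.
    """
    zs = [erasure_prob]
    for _ in range(levels):
        zs = [z_next for z in zs for z_next in (2 * z - z * z, z * z)]
    return zs


def unpolarized_fraction(zs, delta):
    """Fraction of channels whose erasure probability is still in (delta, 1 - delta)."""
    return sum(delta < z < 1 - delta for z in zs) / len(zs)


if __name__ == "__main__":
    for levels in (3, 8, 16):
        zs = polarize_bec(0.5, levels)
        print(levels, unpolarized_fraction(zs, 0.01))
```

The printed fraction shrinks as the number of levels grows, which is (weak) polarization. Strong polarization concerns how quickly this fraction and the threshold `delta` can shrink together.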
In this work we extend the result above to cover martingales associated with
all matrices that satisfy the necessary condition for (weak) polarization. In
addition to being vastly more general, our proofs of strong polarization are
also simpler and modular. Specifically, our result shows strong polarization
over all prime fields and leads to efficient capacity-achieving codes for
arbitrary symmetric memoryless channels. We show how to use our analyses to
achieve exponentially small error probabilities at lengths inverse polynomial
in the gap to capacity. Indeed we show that we can essentially match any error
probability with lengths that are only inverse polynomial in the gap to
capacity.
Comment: 73 pages, 2 figures. The final version appeared in JACM. This paper
combines results presented in preliminary form at STOC 2018 and RANDOM 201
Communication Complexity of Inner Product in Symmetric Normed Spaces
We introduce and study the communication complexity of computing the inner
product of two vectors, where the input is restricted w.r.t. a norm $N$ on the
space $\mathbb{R}^n$. Here, Alice and Bob hold two vectors $v, u$ such that
$\|v\|_N \le 1$ and $\|u\|_{N^*} \le 1$, where $N^*$ is the dual norm. They want
to compute their inner product $\langle v, u \rangle$ up to an $\varepsilon$
additive term. The problem is denoted by $IP_N$.
We systematically study $IP_N$, showing the following results:
- For any symmetric norm , given and
there is a randomized protocol for using
bits -- we will denote this by
.
- One way communication complexity
, and a nearly matching lower bound
for .
- One way communication complexity for a
symmetric norm is governed by embeddings into .
Specifically, while a small distortion embedding easily implies a lower bound
, we show that, conversely, non-existence of such an embedding
implies protocol with communication .
- For arbitrary origin symmetric convex polytope , we show
, where is the unique norm for which is a unit ball,
and is the extension complexity of .
Comment: Accepted to ITCS 202
An Improved Lower Bound for Sparse Reconstruction from Subsampled Hadamard Matrices
We give a short argument that yields a new lower bound on the number of
subsampled rows from a bounded, orthonormal matrix necessary to form a matrix
with the restricted isometry property. We show that a matrix formed by
uniformly subsampling rows of an $N \times N$ Hadamard matrix contains a
$K$-sparse vector in the kernel, unless the number of subsampled rows is
$\Omega(K \log K \log (N/K))$; our lower bound applies whenever $\min(K, N/K) > \log^C N$. Containing a sparse vector in the kernel precludes not only
the restricted isometry property, but more generally the application of those
matrices for uniform sparse recovery.
Comment: Improved exposition and added an autho
When Does Optimizing a Proper Loss Yield Calibration?
Optimizing proper loss functions is popularly believed to yield predictors
with good calibration properties; the intuition being that for such losses, the
global optimum is to predict the ground-truth probabilities, which is indeed
calibrated. However, typical machine learning models are trained to
approximately minimize loss over restricted families of predictors, that are
unlikely to contain the ground truth. Under what circumstances does optimizing
proper loss over a restricted family yield calibrated models? What precise
calibration guarantees does it give? In this work, we provide a rigorous answer
to these questions. We replace the global optimality with a local optimality
condition stipulating that the (proper) loss of the predictor cannot be reduced
much by post-processing its predictions with a certain family of Lipschitz
functions. We show that any predictor with this local optimality satisfies
smooth calibration as defined in Kakade-Foster (2008), Błasiok et al.
(2023). Local optimality is plausibly satisfied by well-trained DNNs, which
suggests an explanation for why they are calibrated from proper loss
minimization alone. Finally, we show that the connection between local
optimality and calibration error goes both ways: nearly calibrated predictors
are also nearly locally optimal.
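The local optimality condition above asks how much a predictor's proper loss can drop under post-processing. The sketch below illustrates the idea under loud assumptions: it uses squared loss as the proper loss and only the tiny subfamily of constant shifts clipped to [0, 1], whereas the abstract refers to a broader family of Lipschitz post-processings. The function names `squared_loss`, `clip01` and `best_shift_gain` are hypothetical.

```python
def squared_loss(predictions, labels):
    """Average squared (Brier) loss, a standard proper loss."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


def clip01(v):
    """Project a value back onto the interval [0, 1]."""
    return min(1.0, max(0.0, v))


def best_shift_gain(predictions, labels, shifts):
    """Largest loss reduction achievable by post-processing v -> clip01(v + c).

    Only constant shifts c from `shifts` are tried; this is an illustrative
    subfamily, not the full Lipschitz family considered in the abstract.
    """
    base = squared_loss(predictions, labels)
    best = min(
        squared_loss([clip01(p + c) for p in predictions], labels) for c in shifts
    )
    return max(0.0, base - best)


labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]       # 30% positive labels
shifts = [k / 20 for k in range(-10, 11)]      # candidate shifts -0.5 ... 0.5
overconfident = [0.5] * len(labels)            # predicts 0.5 everywhere
calibrated = [0.3] * len(labels)               # predicts the true base rate
```

Here `best_shift_gain(overconfident, labels, shifts)` is about 0.04, because shifting every prediction down by 0.2 helps, while the gain for `calibrated` is 0: no shift improves a predictor that already matches the base rate.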
A Unifying Theory of Distance from Calibration
We study the fundamental question of how to define and measure the distance
from calibration for probabilistic predictors. While the notion of perfect
calibration is well-understood, there is no consensus on how to quantify the
distance from perfect calibration. Numerous calibration measures have been
proposed in the literature, but it is unclear how they compare to each other,
and many popular measures such as Expected Calibration Error (ECE) fail to
satisfy basic properties like continuity.
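The continuity failure of ECE mentioned above can be seen on a small synthetic example. The sketch below is an illustration chosen for this purpose rather than taken from the source: labels are balanced, one predictor outputs 1/2 everywhere, and a second predictor moves each prediction by only `eps` toward the true label. The helper name `ece` is hypothetical, and it computes the unbinned quantity E|E[y | f] - f| for a predictor with finitely many distinct values.

```python
from collections import defaultdict


def ece(predictions, labels):
    """Expected calibration error E|E[y | f] - f| over distinct prediction values."""
    groups = defaultdict(list)
    for p, y in zip(predictions, labels):
        groups[p].append(y)
    n = len(predictions)
    return sum(
        len(ys) / n * abs(sum(ys) / len(ys) - p) for p, ys in groups.items()
    )


labels = [0, 1] * 500                      # balanced binary labels
eps = 0.01
constant = [0.5] * len(labels)             # perfectly calibrated on this data
perturbed = [0.5 + eps if y == 1 else 0.5 - eps for y in labels]

print(ece(constant, labels))               # calibrated predictor
print(ece(perturbed, labels))              # predictor within eps of it
```

The two predictors differ by at most `eps` at every point, yet the ECE jumps from 0 to roughly 1/2 - `eps`, so an arbitrarily small change in predictions produces a large change in the measure.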
We present a rigorous framework for analyzing calibration measures, inspired
by the literature on property testing. We propose a ground-truth notion of
distance from calibration: the $\ell_1$ distance to the nearest perfectly
calibrated predictor. We define a consistent calibration measure as one that is
polynomially related to this distance. Applying our framework, we identify
three calibration measures that are consistent and can be estimated
efficiently: smooth calibration, interval calibration, and Laplace kernel
calibration. The former two give quadratic approximations to the ground truth
distance, which we show is information-theoretically optimal in a natural model
for measuring calibration which we term the prediction-only access model. Our
work thus establishes fundamental lower and upper bounds on measuring the
distance to calibration, and also provides theoretical justification for
preferring certain metrics (like Laplace kernel calibration) in practice.
Comment: In STOC 202
Induced minors and well-quasi-ordering
A graph $H$ is an induced minor of a graph $G$ if it can be obtained from an induced subgraph of $G$ by contracting edges. Otherwise, $G$ is said to be $H$-induced minor-free. Robin Thomas showed in [Graphs without $K_4$ and well-quasi-ordering, Journal of Combinatorial Theory, Series B, 38(3):240--247, 1985] that $K_4$-induced minor-free graphs are well-quasi-ordered by induced minors. We provide a dichotomy theorem for $H$-induced minor-free graphs and show that the class of $H$-induced minor-free graphs is well-quasi-ordered by the induced minor relation if and only if $H$ is an induced minor of the gem (the path on 4 vertices plus a dominating vertex) or of the graph obtained by adding a vertex of degree 2 to the complete graph on 4 vertices. Similar dichotomy results were previously given by Guoli Ding in [Subgraphs and well-quasi-ordering, Journal of Graph Theory, 16(5):489--502, 1992] for subgraphs and Peter Damaschke in [Induced subgraphs and well-quasi-ordering, Journal of Graph Theory, 14(4):427--435, 1990] for induced subgraphs.